3 research outputs found

All-rounder: A flexible DNN accelerator with diverse data format support

Author: Jang Yongjoo
Kung Jaeha
Lee Seungpyo
Noh Seock-Hwan
Park Sehun
Shin Banseok
Publication date: 25/10/2023

Recognizing the explosive increase in the use of DNN-based applications, several industrial companies have developed custom ASICs (e.g., Google TPU, IBM RaPiD, Intel NNP-I/NNP-T) and built hyperscale cloud infrastructure around them. These ASICs execute the inference or training operations of the DNN models that users request. Since DNN models differ in their data formats and operation types, an ASIC needs to support diverse data formats and remain general across operations. However, conventional ASICs do not fulfill these requirements. To overcome these limitations, we propose a flexible DNN accelerator called All-rounder. The accelerator is designed with an area-efficient multiplier supporting multiple precisions of integer and floating-point data types. In addition, it comprises a flexibly fusible and fissionable MAC array to support various types of DNN operations efficiently. We implemented the register transfer level (RTL) design in Verilog and synthesized it in 28nm CMOS technology. To examine the practical effectiveness of our proposed designs, we designed two multiply units and three state-of-the-art DNN accelerators. We compare our multiplier with the multiply units and perform an architectural evaluation of performance and energy efficiency with eight real-world DNN models. Furthermore, we compare the benefits of the All-rounder accelerator against a high-end GPU card, i.e., the NVIDIA GeForce RTX 3090. The proposed All-rounder accelerator consistently achieves higher speedup and energy efficiency than the baselines across various DNN benchmarks.
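The abstract above mentions a multiplier that supports multiple precisions and a MAC array that can be fused or split. As a rough illustration of the general idea only (not the All-rounder design, which the abstract does not detail), the sketch below builds one 8-bit unsigned product from four 4-bit partial products, and shows that the same narrow units can instead produce independent 4-bit products. The function names `mul4`, `mul8_fused`, and `mul4_pair` are hypothetical.

```python
def mul4(a: int, b: int) -> int:
    """Software model of a 4-bit x 4-bit unsigned multiplier (hypothetical unit)."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b


def mul8_fused(a: int, b: int) -> int:
    """Fused mode: one 8-bit x 8-bit unsigned product from four 4-bit products."""
    assert 0 <= a < 256 and 0 <= b < 256
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF
    # (a_hi*16 + a_lo) * (b_hi*16 + b_lo), expanded into shifted partial products
    high = mul4(a_hi, b_hi) << 8
    cross = (mul4(a_hi, b_lo) + mul4(a_lo, b_hi)) << 4
    low = mul4(a_lo, b_lo)
    return high + cross + low


def mul4_pair(a0: int, b0: int, a1: int, b1: int) -> tuple[int, int]:
    """Split mode: the narrow units produce two independent 4-bit products."""
    return mul4(a0, b0), mul4(a1, b1)
```

Under these assumptions, `mul8_fused(a, b)` equals `a * b` for every pair of 8-bit unsigned inputs, which is the property a fusible multiplier has to preserve when it combines narrow units.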

arXiv.org e-Print Archive

On Hardware Improvements for a Sparse Matrix Multiplication Accelerator

Author: Banseok Shin
Publication venue: Daegu
Publication date: 21/03/2023

Keywords: SpGEMM, Distribution Network, Reduction Network, Data tiling

Deep learning is being used and researched in various industries such as image processing, natural language processing, and recommendation algorithm services. Model sizes are also growing in tandem with deep learning technologies in order to increase accuracy. In addition, sparse matrix multiplication accounts for the majority of operations in deep learning models. As a result, there is an increasing need for accelerator research on sparse matrix multiplication. One of the accelerators that supports the sparse general matrix-matrix multiplication (SpGEMM) operation is SIGMA (A Sparse and Irregular GEMM Accelerator). However, SIGMA's operation networks and index-matching process are inefficient. We propose improvements in three aspects to solve these problems. First, the distribution network's redundant hardware modules are eliminated. When multiple Flex-DPEs are controlled by a NoC (network on chip), area and power can be reduced by using a network from which the unnecessary parts are removed. Second, we propose a new topology that uses only the output flip-flops to store and accumulate the partial sums of the reduction network. Finally, for fast processing, we propose using the sparsity of each matrix, the number of operation elements, and the matrix size as indicators for choosing an efficient partitioning approach, with a pre-calculated table serving as a look-up table (LUT). With the proposed distribution and reduction network enhancements, the total hardware area decreased by roughly 21.8% and the power by 37.5%. When the LUT is used and the data are tiled by 2, the clock cycle count can be reduced by around 80% when the stationary matrix's sparsity is 80% and the streaming matrix's sparsity is 99%.

Contents
Ⅰ. Introduction
Ⅱ. Background and Prior Work
2.1 Background
2.1.1 Multi-layer Perceptron (MLP)
2.1.2 Convolutional Neural Networks (CNN)
2.1.3 Transformer
2.2 Prior Works: Inner, Outer, Row-wise Product Based Accelerators
2.3 Prior Work: SIGMA
2.3.1 Dataflow of SIGMA
2.3.2 Distribution Network
2.3.3 Reduction Network
Ⅲ. Proposed Sparse Accelerator Design
3.1 Distribution Network
3.2 Reduction Network
3.2.1 Reorganized adder tree
3.3 Data Tiling Strategy
Ⅳ. Evaluation
4.1 Methodology
4.2 Experimental Results
4.2.1 Area / Power Improvements
4.2.2 Performance Improvements
Ⅴ. Conclusion
References
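For readers unfamiliar with the SpGEMM operation named in the abstract, the following is a minimal software sketch of a row-wise sparse matrix product. It assumes a dictionary-of-rows storage format chosen purely for illustration; it does not model SIGMA's dataflow or the thesis's hardware, and the function name `spgemm` is hypothetical.

```python
from collections import defaultdict


def spgemm(a: dict, b: dict) -> dict:
    """Multiply sparse matrices stored as {row: {col: value}} (assumed format).

    Only stored (nonzero) entries are visited, which is the point of SpGEMM:
    work scales with the number of matching nonzeros, not the dense size.
    """
    c = defaultdict(dict)
    for i, a_row in a.items():
        for k, a_ik in a_row.items():
            # Index matching: row k of B is needed only if A has a nonzero in column k.
            for j, b_kj in b.get(k, {}).items():
                c[i][j] = c[i].get(j, 0) + a_ik * b_kj
    return dict(c)


# Usage: A is 2x3 and B is 3x2, each with a few nonzeros.
A = {0: {0: 1, 2: 2}, 1: {1: 3}}
B = {0: {1: 4}, 2: {0: 5}}
C = spgemm(A, B)
```

In this example row 1 of `A` has its only nonzero in column 1, and `B` has no stored row 1, so row 1 of the result stays empty and is simply absent from `C`.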

DGIST Library Institutional Repository

Universal primers for rift valley fever virus whole-genome sequencing

Author: Eom Sujeong
Kim Kwan Woo
Kim Seil
Lee Banseok
Park Changwoo
Shin Donghoon
Yi Hana
Publication venue: Nature Research
Publication date: 01/01/2023

Rift Valley fever (RVF) is a mosquito-borne zoonotic disease causing acute hemorrhagic fever. Accurate identification of mutations and phylogenetic characterization of RVF virus (RVFV) require whole-genome analysis. Universal primers to amplify the entire RVFV genome from clinical samples with low copy numbers are currently unavailable. Thus, we aimed to develop universal primers applicable for all known RVFV strains. Based on the genome sequences available from public databases, we designed eight pairs of universal PCR primers covering the entire RVFV genome. To evaluate primer universality, four RVFV strains (ZH548, Kenya 56 (IB8), BIME-01, and Lunyo), encompassing viral phylogenetic diversity, were chosen. The nucleic acids of the test strains were chemically synthesized or extracted via cell culture. These RNAs were evaluated using the PCR primers, resulting in successful amplification with expected sizes (0.8–1.7 kb). Sequencing confirmed that the products covered the entire genome of the RVFV strains tested. Primer specificity was confirmed via in silico comparison against all non-redundant nucleotide sequences using the BLASTn alignment tool in the NCBI database. To assess the clinical applicability of the primers, mock clinical specimens containing human and RVFV RNAs were prepared. The entire RVFV genome was successfully amplified and sequenced at a viral concentration of 10^8 copies/mL. Given the universality, specificity, and clinical applicability of the primers, we anticipate that the RVFV universal primer pairs and the developed method will aid in RVFV phylogenomics and mutation detection. © 2023, The Author(s).
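The abstract reports that amplification produced products of the expected sizes. As a toy illustration of how an expected amplicon size follows from a primer pair, the sketch below locates a forward primer and the reverse complement of a reverse primer in a DNA template by exact string matching. The sequences are made up, the function names are hypothetical, and real primer checks (such as the BLASTn comparison the authors describe) tolerate mismatches that this sketch does not.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def amplicon_length(template: str, fwd: str, rev: str):
    """Length of the product bounded by the two primers, or None if not found.

    Assumes exact matches and a template given as the forward (sense) strand.
    """
    start = template.find(fwd)
    if start < 0:
        return None
    # The reverse primer anneals to the opposite strand, so on the forward
    # strand its binding site appears as the primer's reverse complement.
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end < 0:
        return None
    return end + len(rev_site) - start


# Usage with made-up sequences (not RVFV data).
template = "GG" + "ACGTAC" + "TTTTT" + "GATTCA" + "CC"
length = amplicon_length(template, "ACGTAC", reverse_complement("GATTCA"))
```

Here the product spans both primer sites and the five bases between them, so `length` is 17.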

IBS Publications Repository

Directory of Open Access Journals